Creating a Handwriting Recognition Corpus for Bushman Languages

Authors

  • Kyle Williams
  • Hussein Suleman
Abstract

Handwriting recognition systems rely on the existence of a corpus for training recognition models and evaluating accuracy. Creating a handwriting recognition corpus for the Bushman languages of southern Africa is difficult due to the complexities of the script used to represent them and the fact that this script cannot be represented using Unicode. To solve this problem, a semi-automatic Web-based tool was developed to segment, capture and encode the Bushman text. A case study demonstrated how the tool could be used to create a Bushman handwriting corpus with few errors.

Similar resources

Learning to Read Bushman: Automatic Handwriting Recognition for Bushman Texts

The Bleek and Lloyd Collection contains notebooks that document the tradition, language and culture of the Bushman people who lived in South Africa in the late 19th century. Transcriptions of these notebooks would allow for the provision of services such as text-based search and text-to-speech. However, these notebooks are currently only available in the form of digital scans and the manual crea...

Corpus and Evaluation of Handwriting Recognition of Historical Genealogical Records

Over the last few decades, significant strides have been made in handwriting recognition (HR), which is the automatic transcription of handwritten documents. HR often focuses on modern handwritten material, but in the electronic age, the volume of handwritten material is rapidly declining. However, we believe HR is on the verge of having major application to historical record collections. In re...

Evaluation of Handwriting Recognition Systems for Application to Historical Records

In the last decade, significant, largely-governmental funding has been applied to the automatic transcription of handwritten documents. Uses for this kind of technology are somewhat limited given that the numbers of handwritten documents are on the decline. However, certain types of handwritten historical records can be crucial for genealogical research in that they identify key vital facts. In...

Atwell 96 a

Geoffrey Leech’s ideas have been inspirational both to Corpus-based computational linguists in general, and to me personally: first as a student and Researcher Associate at Lancaster University, then as a Lecturer in Artificial Intelligence at Leeds University. This chapter focuses on research at Lancaster and Leeds building on Geoffrey Leech’s ideas, looking in particular at how corpus resourc...

Recognition of Myanmar Handwriting Text Based on Hidden Markov Model

Handwriting recognition is one of the most challenging tasks and exciting areas of research in computer vision. Numerous document recognition methods have been proposed for various languages and character sets such as Arabic, Indian, Korean, Japanese, Chinese and so on. This paper presents the recent results of research work on Myanmar handwriting text recognition and translation. Each segmente...

Publication date: 2011